Mixed Bayesian Networks with Auxiliary Variables for Automatic Speech Recognition
Abstract
In standard automatic speech recognition (ASR), hidden Markov models (HMMs) compute their emission probabilities with an artificial neural network (ANN) or a Gaussian distribution conditioned only on the hidden state variable. Recent work [12] showed the benefit of also conditioning the emission distributions on a discrete auxiliary variable that is observed in training and hidden in recognition. Related work [3] has shown the utility of conditioning the emission distributions on a continuous auxiliary variable. We apply mixed Bayesian networks (BNs) to extend these works by introducing a continuous auxiliary variable that is observed in training but hidden in recognition. We find that an auxiliary pitch variable that is itself conditioned on the hidden state can degrade performance unless the auxiliary variable is also hidden. Furthermore, performance can be improved by making the auxiliary pitch variable independent of the hidden state.
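To make the idea of a continuous auxiliary variable concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-dimensional feature, a linear-Gaussian emission whose mean shifts with the auxiliary value, and a Gaussian auxiliary variable that is independent of the hidden state. All function and parameter names (`mu_q`, `b_q`, `var_q`, `aux_mean`, `aux_var`) are hypothetical. When the auxiliary value is observed, the emission is evaluated directly; when it is hidden, it is integrated out, which under these assumptions has a closed form.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(x; mean, var)."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def emission_aux_observed(x, a, mu_q, b_q, var_q):
    """p(x | q, a): state-specific Gaussian whose mean shifts linearly with a.

    Assumption (not from the paper): mean = mu_q + b_q * a.
    """
    return gaussian_pdf(x, mu_q + b_q * a, var_q)


def emission_aux_hidden(x, mu_q, b_q, var_q, aux_mean, aux_var):
    """p(x | q) with the auxiliary variable integrated out, in closed form.

    Assumes a ~ N(aux_mean, aux_var), independent of the state q, so the
    marginal is Gaussian with a shifted mean and an inflated variance.
    """
    return gaussian_pdf(x, mu_q + b_q * aux_mean, var_q + b_q ** 2 * aux_var)


def emission_aux_hidden_numeric(x, mu_q, b_q, var_q, aux_mean, aux_var, steps=20000):
    """The same marginal by midpoint-rule integration, as a sanity check."""
    sd = math.sqrt(aux_var)
    lo, hi = aux_mean - 10.0 * sd, aux_mean + 10.0 * sd
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        a = lo + (i + 0.5) * width
        total += emission_aux_observed(x, a, mu_q, b_q, var_q) * gaussian_pdf(a, aux_mean, aux_var)
    return total * width
```

In training, where the auxiliary value is available, `emission_aux_observed` would be used; in recognition, where it is hidden, `emission_aux_hidden` plays the role of the state-conditional emission. If the auxiliary variable instead depended on the hidden state, `aux_mean` and `aux_var` would become state-specific parameters.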
Similar resources
Modeling auxiliary information in Bayesian network based ASR
Automatic speech recognition bases its models on the acoustic features derived from the speech signal. Some have investigated replacing or supplementing these features with information that cannot be precisely measured automatically (articulator positions, pitch, gender, etc.). Consequently, automatic estimates of the desired information would be generated. This data can degrade performance ...
An Introduction to Bayesian Networks for Automatic Speech Recog
Bayesian networks are a particular type of graphical model, providing a general and flexible framework to model, factor, and compute joint probability distributions among random variables in a compact and efficient way. For speech recognition, a BN permits each speech frame to be associated with an arbitrary set of random variables. They can be used to augment well-known statistical paradigms ...
Bayesian network structures and inference techniques for automatic speech recognition
This paper describes the theory and implementation of Bayesian networks in the context of automatic speech recognition. Bayesian networks provide a succinct and expressive graphical language for factoring joint probability distributions, and we begin by presenting the structures that are appropriate for doing speech recognition training and decoding. This approach is notable because it expresse...
Dynamic Bayesian Networks for Multi-Dialect Isolated Arabic Recognition
Hidden Markov models (HMMs) are currently widely used in automatic speech recognition (ASR) as the most effective models. HMMs are, however, just a special case of a class of graphical models, the dynamic Bayesian networks (DBNs). DBNs are more sophisticated modeling tools because they allow several specific variables to be included in the problem of automatic speech recognition other than the...
Investigating Mixed Discrete/Continuous Dynamic Bayesian Networks with Application to Automatic Speech Recognition
Notation
s_t: the state of a discrete (switch) hidden variable at time t
h_t: the state of a continuous hidden variable at time t
o_t: a feature vector at time t
v_t: a sample of the speech signal at time t
x_{1:T}: shorthand for x_1, x_2, ..., x_T
φ: a particular setting of the HMM parameters